A MAN-MACHINE INTERFACE BASED ON 3-D POSITIONS OF THE HUMAN
BODY

FIELD OF THE INVENTION 

The invention relates to a man-machine interface wherein three-dimensional
positions of parts of the body of a user are detected and used as an input to a
computer.

BACKGROUND OF THE INVENTION 

In US 2002/0036617, a method and an apparatus are disclosed for inputting position,
attitude (orientation) or other object characteristic data to computers for the purpose
of Computer Aided Learning, Teaching, Gaming, Toys, Simulations, Aids to the
disabled, Word Processing and other applications. Preferred embodiments utilize
electro-optical sensors, and particularly TV cameras, for provision of optically inputted
data from specialized datums on objects and/or natural features of objects. Objects
can be both static and in motion, from which individual datum positions and
movements can be derived, also with respect to other objects both fixed and moving.

SUMMARY OF THE INVENTION

According to the present invention, an electronic system is provided for determining
three-dimensional positions within a measuring volume, comprising at least one
electronic camera for recording of at least two images with different viewing angles of
the measuring volume, and an electronic processor that is adapted for real-time
processing of the at least two images for determination of three-dimensional
positions in the measuring volume of selected objects in the images.

In a preferred embodiment of the invention, the electronic system comprises one
electronic camera for recording images of the measuring volume, and an optical
system positioned in front of the camera for interaction with light from the measuring
volume in such a way that the at least two images with different viewing angles of the
measuring volume are formed in the camera.

Positions of points in the measurement volume may be determined by simple
geometrical calculations, such as by triangulation.

The optical system may comprise optical elements for reflection, deflection, refraction
or diffraction of light from the measurement volume for formation of the at least two
images of the measurement volume in the camera. The optical elements may
comprise mirrors, lenses, prisms, diffractive optical elements, such as holographic
optical elements, etc., for formation of the at least two images.

Preferably, the optical system comprises one or more mirrors for deflection of light
from the measurement volume for formation of the at least two images of the
measurement volume in the camera.

Recording of the at least two images with a single camera has the advantage that
the images are recorded simultaneously so that further synchronization of image
recording is not needed. Further, since recordings are performed with the same
optical system, the images are subjected to substantially identical color deviations,
optical distortion, etc., so that, substantially, mutual compensation of the images is not
needed.

In a preferred embodiment of the invention, the optical system is symmetrical about a
symmetry plane, and the optical axis of the camera substantially coincides with the
symmetry plane so that all characteristics of the images are substantially identical,
substantially eliminating a need for subsequent matching of the images.

In a preferred embodiment of the invention, the system is calibrated so that image
forming distortions of the camera may be compensated, whereby a low cost digital
camera, e.g. a web camera, may be incorporated in the system, since after
calibration, the images of the camera can be used for accurate determinations of
three-dimensional positions in the measurement volume although the camera itself
provides images with significant geometrical distortion. For example, today's web
cameras exhibit approx. 10 - 12 % distortion. After calibration, the accuracy of positions
determined by the present system utilizing a low cost web camera with 640 * 480
pixels is approx. 1 %. Accuracy is a function of pixel resolution.

Preferably, calibration is performed by illuminating a screen by a projector with good
quality optics displaying a known calibration pattern, i.e. comprising a set of points
with well-known three-dimensional positions on the screen.

For example, in an embodiment with one camera and an optical system for formation
of stereo images in the camera, each point in the measurement volume lies on two
intersecting lines of sight, each of which intersects a respective one of the images of
the camera at a specific pixel. Camera distortion, tilt, skew, etc. displace the line of
sight to another pixel than the "ideal" pixel, i.e. the intersected pixel without camera
distortion and inaccurate camera position and orientation. Based on the calibration
and the actual intersected pixel, the "ideal" pixel is calculated, e.g. by table look-up,
and accurate lines of sight for each pixel in each of the images are calculated, and
the three-dimensional position of the point in question is calculated by triangulation of
the calculated lines of sight.

The processor may further be adapted for recognizing predetermined objects, such
as body parts of a human body, for example for determining three-dimensional
positions of body parts in relation to each other, e.g. by determining human body joint
angles.

In a preferred embodiment of the present invention, colors are recognized by table
look-up, the table entries being color values of a color space, such as RGB-values, or
corresponding values of another color space, such as the CIE 1976 L*a*b* color
space, the CIE 1976 L*u*v* color space, the CIELCH (L*C*h°) color space, etc.

8 bit RGB values create a 24 bit entry word, and with a one bit output value, the table
will be a 16 Mbit table, which is adequate with present day's computers. The output
values may be one if the entry value indicates the color to be detected, and zero if
not.
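By way of illustration only, the table look-up described above may be sketched as
follows. The function names, the bit packing order and the example set of reddish
values are assumptions made for this sketch and are not taken from the described
system; the sketch merely shows that a 24 bit entry word with a one bit output fits
in a 16 Mbit (2 MiB) table.

```python
# Hypothetical sketch: a one-bit-per-entry colour table indexed by a 24 bit RGB word.
TABLE_BITS = 1 << 24            # one entry for every 8 bit R, G, B combination
TABLE_BYTES = TABLE_BITS // 8   # 16 Mbit stored as 2 MiB


def rgb_index(r, g, b):
    """Pack three 8 bit channel values into one 24 bit entry word (assumed order)."""
    return (r << 16) | (g << 8) | b


def build_table(colours):
    """Return a bit table holding 1 for every (r, g, b) triple in `colours`."""
    table = bytearray(TABLE_BYTES)
    for r, g, b in colours:
        i = rgb_index(r, g, b)
        table[i >> 3] |= 1 << (i & 7)
    return table


def is_target_colour(table, r, g, b):
    """Look up the one bit output value for a pixel colour."""
    i = rgb_index(r, g, b)
    return (table[i >> 3] >> (i & 7)) & 1


# Example data (assumed): mark a small block of reddish values as the colour to detect.
reddish = [(r, g, b)
           for r in range(200, 256)
           for g in range(0, 40)
           for b in range(0, 40)]
table = build_table(reddish)
```

Once the table is built, classifying a pixel costs a single index computation and one
memory access, which is the property that makes the approach attractive for
real-time use.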

Skin color detection may be used for detection of positions of a user's head, hands,
and possibly other exposed parts of the body. Further, the user may wear patches of
specific colors and/or shapes that allow identification of a specific patch and
three-dimensional position determination of the patch.

The user may wear retro-reflective objects to be identified by the system and their
three-dimensional position may be determined by the system.

The positions and orientations of parts of a user's body may be used as input data to
a computer, e.g. as a substitution for or a supplement to the well-known keyboard
and mouse/trackball/joystick computer interface. For example, the execution of a
computer game may be made dependent on user body positioning and movement,
making the game perception more "real". Positions and orientations of bodies of
more than one user may also be detected by the system according to the present
invention and used as input data to a computer, e.g. for interaction in a computer
game, or for co-operation, e.g. in computer simulations of e.g. space craft missions,
etc.

Positions and orientations of parts of a user's body may also be used as input data to
a computer monitoring a user performing certain exercises, for example physical
rehabilitation after acquired brain damage, a patient re-training after surgery, an
athlete training for an athletic meeting, etc. The recorded positions and orientations
may be compared with desired positions and orientations, and feedback may be
provided to the user signaling his or her performance. Required improvements may
be suggested by the system. For example, physiotherapeutic parameters may be
calculated by the system based on determined positions of specific parts of the body
of the user.

Feedback may be provided as sounds and/or images.

Three-dimensional positions are determined in real time, i.e. a user of the system
perceives immediate response by the system to movement of his or her body. For
example, positions of 13 points of the body may be determined 26 times per second.
Preferably, three-dimensional position determination and related calculations of body
positions and orientations are performed once for each video frame of the camera, i.e. 60
times per second with today's video cameras.

BRIEF DESCRIPTION OF THE DRAWINGS 

In the following, exemplary embodiments of the invention will be further explained
with reference to the drawing wherein:

Fig. 1 illustrates schematically a man-machine interface according to the present
invention,

Fig. 2 illustrates schematically a sensor system according to the present invention, 

Fig. 3 illustrates schematically a calibration set-up for the system according to the 
present invention, 

Fig. 4 illustrates the functions of various parts of a system according to the present
invention,

Fig. 5 illustrates schematically an image feature extraction process,

Fig. 6 illustrates schematically 3D acquisition, and 

Fig. 7 illustrates schematically a 3D tracking process. 

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

In many systems the interaction between a human operator or user and a computer
is central. The present invention relates to such a system, where the user interface
comprises a 3D imaging system facilitating monitoring of e.g. the movements of the
user or other objects in real time.

It is known that it is possible to obtain stereo images with one camera and an optical
system in front of the lens of the camera. For example, the optical system may form a
pair of images in the camera with different viewing angles, thus forming stereoscopic
images. The different viewing angles of the two images provide information about the
distance from the camera of points that appear in both images. The distance may be
determined geometrically, e.g. by triangulation. The accuracy of the distance
determination depends on the focal length of the camera lens, the distance between
the apparent focal points created by the optical system in front of the camera, and
also on the geometric distortion created by tilt, skew, etc. of the camera, the lens of
the camera, the optical system in front of it and the image sensor in the camera.
Typically, the image sensor is an integrated circuit, which is produced using precise
lithographic methods. Typically, the sensor comprises an array of light sensitive
cells, so-called pixels, e.g. an array of 640*480 pixels. As a result of the lithographic
process, the array is very uniform and the position of each pixel is accurately
controlled. The position uncertainty is kept below a fraction of a pixel. This means
that the geometrical distortion in the system according to the invention is mainly
generated by the optical components of the system.

It is well known how to compensate geometric distortion by calibration of a lens
based on a few images taken with a known static image pattern placed in different
parts of the scene. The result of this calibration is an estimate of key optical
parameters of the system that are incorporated in formulas used for calculations of
positions taking the geometrical distortion of the system into account. The
parameters are typically the focal length and coefficients in a polynomial
approximation that transforms a plane into another plane. Such a method may be
applied to each image of the present system.

It is however preferred to apply a novel and inventive calibration method to the
system. Assume that an image is generated wherein the physical position of each
pixel is known and each pixel is like a lighthouse emitting its position in a code. If
such an image were placed in front of the camera of the present system covering the
measurement volume, then each pixel in the camera would receive information which
could be used to calculate the actual line of sight. The advantage of this approach is
that as long as the focal point of the camera lens can be considered a point,
complete compensation for the geometric distortion is possible. So a low cost camera
with a typical geometrical distortion of the lens and the optical system positioned in
front of the camera of e.g. 12 % may be calibrated to obtain an accuracy of the
system that is determined by the accuracy of the sensor in the camera.

The advantage of using a single camera to obtain stereo images is that the images
are captured simultaneously and with the same focal length of the lens, as well as
the same spectral response, gain and most other parameters of the camera. The
interfacing is simple and no synchronisation of several cameras is required. Since the
picture is effectively split in two by the optical system in front of the camera, the
viewing angle is halved. A system with a single camera will make many interesting
applications feasible, both due to the low cost of the camera system and the
substantially eliminated image matching requirements. It is expected that, in the
future, both the resolution of PC cameras and the PC processing power will steadily
increase, further increasing the performance of the present system.

Fig. 1 illustrates schematically an embodiment of a man-machine interface 1
according to the present invention. The system 1 comprises three main components:
an optical system 5, a camera 6 and an electronic processor 7. The optical system 5
and the camera 6 in combination are also denoted the sensor system 4.

During operation of the system 1, objects 2 in the measurement volume, such as
persons or props, are detected by the sensor system 4. The electronic processor 7
processes the captured images of the objects 2 and maps them to a simple 3D
hierarchical model of the 'Real World Object' 2 from which 3D model data (like
angles between joints in a person, or x, y, z-position and rotations of joints) are
extracted and can be used by electronic applications 8, e.g. for Computer Control.

Fig. 2 illustrates one embodiment of the sensor system 4 comprising a web cam 12 and
four mirrors 14, 16, 18, 20. The four mirrors 14, 16, 18, 20 and the web cam 12 lens
create two images of the measurement volume at the web cam sensor so that
three-dimensional positions of points in the measurement volume 22 may be determined
by triangulation. The large mirrors 18, 20 are positioned substantially perpendicular
to each other. The camera 12 is positioned so that its optical axis is horizontal, and in
the three-dimensional coordinate system 24, the y-axis 26 is horizontal and parallel
to a horizontal row of pixels in the web cam sensor, the x-axis 28 is vertical and
parallel to a vertical column of pixels in the web cam sensor, and the z-axis 30 points
in the direction of the measurement volume. The position of the centre of the
coordinate system is arbitrary. Preferably, the sensor system 4 is symmetrical around
a vertical and a horizontal plane.

In another embodiment of the invention, real cameras may substitute the virtual
cameras 12a, 12b, i.e. the mirrored images 12a, 12b of the camera 12.

As illustrated in Fig. 3, during calibration, a vertical screen 32 is positioned in front of
the sensor system 4 in the measurement volume 22 substantially perpendicular to
the optical axis of the web cam 12, and a projector 34 generates a calibration image
with known geometries on the screen. Position determinations of specific points in
the calibration image are made by the system at two different distances of the screen
from the camera, whereby the geometrical parameters of the system may be
determined. Based on the calibration, the lines of sight for each pixel of each of the
images are determined, and e.g. the slopes of the lines of sight are stored in a table.
The position of a point P in the measurement volume is determined by triangulation
of the respective lines of sight. In general, the two lines of sight will not intersect in
space because of the quantisation of the image into a finite number of pixels.
However, they will get very close to each other, and the distance between the lines of
sight will have a minimum at the point P. If this minimum distance is less than a
threshold determined by the quantisation as determined by the pixel resolution, the
coordinates of P are determined as the point of minimum distance between the
respective lines of sight.
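The triangulation of two lines of sight that do not quite intersect may, purely as an
illustration, be sketched as below. Each line of sight is assumed to be given by a
point (e.g. the optical centre of a virtual camera) and a direction vector; the
function names, the choice of the midpoint of the shortest connecting segment as the
position of P, and the `max_gap` parameter are assumptions of this sketch.

```python
def closest_point_between_lines(p1, d1, p2, d2):
    """Return (midpoint, distance) of the shortest segment joining two 3D lines.

    Line k is the set of points pk + t * dk. The midpoint is taken here as the
    triangulated position (an assumption; the text only requires the point of
    minimum distance between the lines of sight).
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def along(p, d, t):
        return tuple(x + t * y for x, y in zip(p, d))

    w = sub(p1, p2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines of sight are parallel")
    t1 = (b * e - c * d) / denom   # parameter of the closest point on line 1
    t2 = (a * e - b * d) / denom   # parameter of the closest point on line 2
    q1, q2 = along(p1, d1, t1), along(p2, d2, t2)
    gap = sub(q1, q2)
    distance = dot(gap, gap) ** 0.5
    midpoint = tuple((x + y) / 2 for x, y in zip(q1, q2))
    return midpoint, distance


def triangulate(p1, d1, p2, d2, max_gap):
    """Return the 3D point, or None if the lines pass further apart than max_gap."""
    point, gap = closest_point_between_lines(p1, d1, p2, d2)
    return point if gap <= max_gap else None
```

In such a sketch `max_gap` plays the role of the threshold determined by the pixel
quantisation; pairs of lines of sight that miss each other by more than this are
rejected rather than triangulated.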

Preferably, a projector generates the calibration image with at least ten times less
geometrical distortion than the system.

In a preferred embodiment of the invention, the calibration image is a black and white
image, and more preferably the calibration image comprises one black section and
one white section, preferably divided by a horizontal borderline or a vertical
borderline.

The calibration method may comprise sequentially projecting a set of calibration
images onto the screen, for example starting with a black and white calibration image
with a horizontal borderline at the top, and sequentially projecting calibration images
moving the borderline downwards a fixed number of calibration image pixels, e.g. by
1 calibration image pixel.

Each camera pixel is assigned a count value that is stored in an array in a processor.
For each calibration image displayed on the screen, the pixel count value is
incremented by one if the corresponding camera pixel "views" a black screen. During
calibration an image of the borderline sweeps the camera sensor pixels, and after
completion of a sweep, the count values contain the required information of which
part of the screen is imaged onto which camera pixels.

This procedure is repeated with a set of black and white calibration images with a
vertical borderline that is swept across the screen, and a second pixel count value is
assigned to each camera pixel that is stored in a second array in the processor.
Again, for each calibration image displayed on the screen, the second pixel count
value is incremented by one if the corresponding camera pixel "views" a black
screen.

Thus, one sweep is used for calibration of the x-component and the other sweep is
used for calibration of the y-component so that the x- and y-component are calibrated
independently.
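The counting scheme of one sweep may be illustrated by the following small
simulation. The array `viewed_coord`, which states which screen coordinate each
camera pixel looks at, is hypothetical and stands in for the unknown optics; the
polarity of the calibration images (white before the borderline, black from the
borderline onward) is likewise an assumption of this sketch.

```python
def sweep_counts(viewed_coord, n_steps):
    """Simulate one borderline sweep and return the per-pixel count values.

    viewed_coord[i][j] is the (hypothetical) screen coordinate, in calibration
    image pixels, seen by camera pixel (i, j). Calibration image k is assumed
    to be white for coordinates below k and black from k onward.
    """
    rows, cols = len(viewed_coord), len(viewed_coord[0])
    counts = [[0] * cols for _ in range(rows)]
    for k in range(n_steps):                    # one projected image per step
        for i in range(rows):
            for j in range(cols):
                if viewed_coord[i][j] >= k:     # camera pixel "views" black
                    counts[i][j] += 1
    return counts


# Example: four camera pixels looking at screen rows 0, 3, 5 and 7.
counts = sweep_counts([[0, 3], [5, 7]], n_steps=8)
```

With this polarity, a camera pixel viewing screen coordinate r ends the sweep with
the count r + 1, so the count value directly encodes the viewed screen position. The
same routine would be run a second time with the column coordinates for the
vertical-borderline sweep.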

Before translating the first and second count values into corresponding lines of sight
for each camera pixel, it is preferred to process the count values. For example,
anomalies may occur, caused e.g. by malfunctioning projector pixels or camera
pixels or by dust on optical parts. A filter may detect deviations of the count values
from a smooth count value surface, and for example a pixel count value deviating
more than 50 % from its neighbouring pixel count values may be substituted by an
average of surrounding pixel count values.
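One possible reading of the anomaly filter is sketched below. The 3 x 3
neighbourhood, the use of the neighbour mean as the reference value and the name
`repair_outliers` are assumptions of the sketch; the text only states that a count
deviating more than 50 % from its neighbours may be replaced by an average of
surrounding values.

```python
def repair_outliers(counts, max_rel_dev=0.5):
    """Replace count values that deviate strongly from their neighbours.

    A value is treated as anomalous when it differs from the mean of its
    (up to eight) neighbours by more than max_rel_dev times that mean.
    """
    rows, cols = len(counts), len(counts[0])
    out = [row[:] for row in counts]
    for i in range(rows):
        for j in range(cols):
            neigh = [counts[a][b]
                     for a in range(max(0, i - 1), min(rows, i + 2))
                     for b in range(max(0, j - 1), min(cols, j + 2))
                     if (a, b) != (i, j)]
            mean = sum(neigh) / len(neigh)
            if mean > 0 and abs(counts[i][j] - mean) > max_rel_dev * mean:
                out[i][j] = mean     # substitute the neighbourhood average
    return out


# Example: a dead pixel (count 40) in an otherwise uniform region of count 100.
patch = [[100, 100, 100],
         [100, 40, 100],
         [100, 100, 100]]
repaired = repair_outliers(patch)
```

All decisions are taken on the original array and written to a copy, so that one
anomalous value cannot influence the test applied to its neighbours.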

Further, at the edges of the camera sensor, the corresponding array of count values
may be extended beyond the camera sensor by smooth extrapolation of pixel count
values at the sensor edge, whereby a smoothing operation on the count values for all
sensor pixels is made possible.

A smoothing operation of the count values may be performed, e.g. by spatial
low-pass filtering of the count values, e.g. by calculation of a moving average of a 51 x 51
pixel square. The size of the smoothing filter window, e.g. the averaging square, is
dependent on the geometrical distortion of the sensor system. The less distortion, the
smaller the filter window may be.

Preferably, the low-pass filtering is repeated twice. 

Preferably, the extended count values for virtual pixels created beyond the camera 
sensor are removed upon smoothing. 
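The extension, smoothing and removal steps may be sketched as follows. Linear
extrapolation at the edges, a separable implementation of the square moving average
and two filter passes in total are assumptions made for this illustration; the function
names are hypothetical.

```python
def pad_linear(values, pad):
    """Extend a sequence by `pad` virtual samples per end, continuing the edge slope."""
    left_step = values[1] - values[0]
    right_step = values[-1] - values[-2]
    left = [values[0] - left_step * k for k in range(pad, 0, -1)]
    right = [values[-1] + right_step * k for k in range(1, pad + 1)]
    return left + list(values) + right


def smooth_1d(values, window, passes=2):
    """Moving average with an odd window; virtual samples are dropped after each pass."""
    half = window // 2
    out = list(values)
    for _ in range(passes):
        padded = pad_linear(out, half)
        # padded[i + half] is out[i], so the window is centred on each real sample
        out = [sum(padded[i:i + window]) / window for i in range(len(out))]
    return out


def smooth_2d(counts, window=51, passes=2):
    """Apply the moving average along rows and then along columns."""
    rows = [smooth_1d(r, window, passes) for r in counts]
    cols = [smooth_1d(c, window, passes) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Because the virtual samples continue the slope at the sensor edge, a count surface
that is already a smooth ramp passes through the filter unchanged, while isolated
spikes are spread out and attenuated.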

The calibration procedure is repeated for two distances between the system and the
screen so that the optical axes of the cameras or the virtual, e.g. mirrored, cameras
shown in Fig. 2 may be determined. It should be noted that the images in the (virtual)
cameras of the respective intersections of the optical axes with the screen do not
move relative to the camera sensor upon displacement along the z-axis of the
system in relation to the screen. Thus, upon displacement, the two unchanged pixels
are determined, whereby the optical axes of the (virtual) cameras are determined.

The position of the optical centre of each (virtual) camera is determined by
calculation of intersections of lines of sight from calibration image pixels equidistantly
surrounding the intersection of the respective optical axis with the screen. An
average of calculated intersections may be formed to constitute the z-value of the
optical centre of the (virtual) camera in question.

Knowing the 3D-position of the optical centre of the (virtual) cameras, the lines of
sight of each of the camera pixels may be determined.

In the illustrated embodiment, the optical axis of the camera is horizontal. However,
in certain applications, it may be advantageous to incline the optical axis with respect
to a horizontal direction, and position the system at a high position above floor level.
Hereby, the measurement volume of the system may cover a larger area of the floor
or ground. For example, the optical axis of the camera (and the system) may be
inclined 23°.

It is relatively easy to adjust the tables to this tilt of the x-axis of the system. 
Preferably, the y-axis remains horizontal. 

There are many ways to extract features from a pair of stereo images, and the choice
affects how the image is processed. Here we assume that we want to detect the major
movements of a single person in the field of view. We can use detection of the skin
and the colour of some objects attached to the person [C]. We can instrument the
person with a set of colours attached to the major joints of the body. By determining
at each instant the position of these features (skin and colours) we have, say, 13
points in each part of the stereo image. The detection of skin follows a well-known
formula where the calculation is performed on each pixel, cf. D. A. Forsyth and M. M.
Fleck: "Automatic detection of human nudes", Kluwer Academic Publishers, Boston.
The calculation is a Boolean function of the value of the colours red, green and blue,
RGB [C.2]. We use the same calculation as for detection of skin for detection of colours,
just with other parameters.

So we obtain for each feature a picture of truth-values: the feature exists or not for
each pixel. Since the objects of interest, skin and colours, normally have a certain
size, we can find areas of connected pixels with the same truth-value for each feature,
called blobs [C.3]. We calculate the position of the centre of each blob [C.5]. Since we
want to determine the 3D position of each object, the blobs should come in pairs, one
blob in each of the stereo images. We establish a relation between blobs in order to
test if the pairing is feasible [C.4]. The pairing is feasible if there is a corresponding
blob in the other stereo image within a certain distance from the original blob. If the
pairing is feasible in both directions, then we assume that the blobs belong to an
object and we use the position of the pair of blobs to determine the position in 3D by
triangulation.

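The blob extraction and the test of pairing in both directions may be sketched as
follows. The use of 4-connectivity, the Euclidean distance between blob centres
expressed in the coordinates of their respective stereo half-images, and all function
names are assumptions of this illustration.

```python
from collections import deque


def blob_centres(mask):
    """Return the centres (row, col) of 4-connected regions of true pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    centres = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:                      # breadth-first flood fill of one blob
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            n = len(pixels)
            centres.append((sum(p[0] for p in pixels) / n,
                            sum(p[1] for p in pixels) / n))
    return centres


def nearest(point, candidates, max_dist):
    """Index of the closest candidate within max_dist, or None."""
    best, best_d = None, max_dist
    for i, q in enumerate(candidates):
        d = ((point[0] - q[0]) ** 2 + (point[1] - q[1]) ** 2) ** 0.5
        if d <= best_d:
            best, best_d = i, d
    return best


def pair_blobs(left, right, max_dist):
    """Keep only pairs whose pairing is feasible in both directions."""
    pairs = []
    for i, p in enumerate(left):
        j = nearest(p, right, max_dist)
        if j is not None and nearest(right[j], left, max_dist) == i:
            pairs.append((i, j))
    return pairs
```

Each returned index pair identifies one blob in each stereo image; the two centres
would then be converted to lines of sight and triangulated.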
The calculation of the 3D position assumes that the geometry of the camera and
optical front-end is known [D].

The basis for the triangulation is the distance between the optical centres of the
mirror images of the camera. If a point is seen in both parts of the stereo image, the
position relative to the camera setup can be calculated, since the angles of the rays
between the point and the optical centres are obtained from the pixels seeing the
point. If the camera is ideal, i.e. there is no geometrical distortion, then the angles for
each pixel relative to the optical axis of each of the mirror images of the camera can be
determined by the geometry of the optical front-end system, i.e. in the case of mirrors
by determining the apparent position and orientation of the camera. While it is not
necessary for the functioning of such a system to position the mirror images on a
horizontal line, this is often done, since it seems more natural to human beings to
orient the system in the way we see it. If the camera is ideal, the above calculation
can be done for each pair of blobs, but it is more efficient in a real time application to
have one or more tables and look up values that can be calculated beforehand
[D.1]. If the tables were organised as if we had two ideal cameras, with the optical
axes normal to the line between the two optical centres, this would further simplify the
calculations, since we could place the value of the tangent function of the angle,
which is required in the calculation, instead of the actual angle in the table.

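For two ideal cameras with parallel optical axes normal to the line between their
optical centres, the table-based calculation may be sketched as below. The pinhole
relation tan(angle) = (pixel - centre) / focal length, the placement of the optical
centres at y = -baseline/2 and y = +baseline/2, and the function and parameter names
are assumptions of this sketch; x is taken as the vertical and y as the horizontal
axis as in the coordinate system of Fig. 2.

```python
def build_tangent_table(n_pixels, centre, focal_px):
    """Precompute tan(viewing angle) for every pixel index of an ideal camera."""
    return [(p - centre) / focal_px for p in range(n_pixels)]


def triangulate_rectified(col_left, col_right, row, tan_h, tan_v, baseline):
    """Return (x, y, z) from one pixel column per virtual camera and a shared row.

    With optical centres at y = -baseline/2 and y = +baseline/2 the horizontal
    tangents are tl = (y + baseline/2) / z and tr = (y - baseline/2) / z, so
    z = baseline / (tl - tr) and y = z * (tl + tr) / 2.
    """
    tl, tr = tan_h[col_left], tan_h[col_right]
    disparity = tl - tr
    if disparity <= 0:
        return None                      # point at infinity or mismatched blobs
    z = baseline / disparity
    y = z * (tl + tr) / 2
    x = z * tan_v[row]
    return (x, y, z)


# Example tables for a hypothetical 320 x 240 half-image with a 400 pixel focal length.
tan_h = build_tangent_table(320, centre=160, focal_px=400.0)
tan_v = build_tangent_table(240, centre=120, focal_px=400.0)
point = triangulate_rectified(200, 160, 220, tan_h, tan_v, baseline=0.2)
```

Storing tangents rather than angles means that no trigonometric function has to be
evaluated per blob pair; the position follows from two table look-ups, one division
and a few multiplications.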
So in principle we now have 13 points in 3D, related to the set of colours of the
objects. In practice the number of points can differ from 13, since objects can
be obscured from being seen in both images of the stereo pair. Also background
objects and illumination can contribute more objects, e.g. an object representing
the face is split in two blobs due to the use of spectacles, a big smile or a beard. This
can also happen if the colours chosen are not discriminated well enough. This means
that it is necessary to consolidate the blobs. Blobs belonging to objects in the
background can be avoided by controlling the background colours and illumination,
or sorted out by estimating and subtracting the background in the images before the
blobs are calculated, or the blobs can be disregarded since they are out of the
volume where the person is moving.

In order to consolidate the 3D points, tracking [E] is used; blobs are formatted [D.2]
and sent to a tracker. This is a similar task to tracking the planes on radar in a flight
control centre. The movements of points are observed over time.

This is done by linear Kalman filtering and consists of target state estimation and
prediction.
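A minimal example of linear Kalman filtering with state estimation and prediction is
given below for a single coordinate of one tracked point. The constant-velocity
motion model, the simplified diagonal process noise and all parameter values are
assumptions of this sketch; the text does not specify the model used by the tracker.

```python
class ConstantVelocityKalman:
    """One-coordinate Kalman filter with state (position, velocity)."""

    def __init__(self, pos, vel=0.0, var_pos=1.0, var_vel=1.0,
                 process_var=1e-2, meas_var=1e-2):
        self.pos, self.vel = pos, vel
        # Covariance matrix P = [[p00, p01], [p10, p11]]
        self.p00, self.p01, self.p10, self.p11 = var_pos, 0.0, 0.0, var_vel
        self.q, self.r = process_var, meas_var

    def predict(self, dt):
        """Target state prediction: x = F x, P = F P F^T + Q, F = [[1, dt], [0, 1]]."""
        self.pos += self.vel * dt
        p00 = self.p00 + dt * (self.p10 + self.p01) + dt * dt * self.p11
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        self.p00, self.p01, self.p10 = p00 + self.q, p01, p10
        self.p11 += self.q               # simplified diagonal process noise (assumed)
        return self.pos

    def update(self, measured_pos):
        """Target state estimation from a position measurement (H = [1, 0])."""
        s = self.p00 + self.r                       # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s         # Kalman gain
        residual = measured_pos - self.pos
        self.pos += k0 * residual
        self.vel += k1 * residual
        p00, p01, p10, p11 = self.p00, self.p01, self.p10, self.p11
        self.p00, self.p01 = (1 - k0) * p00, (1 - k0) * p01
        self.p10, self.p11 = p10 - k1 * p00, p11 - k1 * p01
```

A tracker along these lines would run one such filter per coordinate and per track,
use `predict` to decide which new 3D point plausibly continues which track, and call
`update` once the association has been made.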
A hypothesis of points in time belonging to the same track is formed, and if the
hypothesis is consistent with other knowledge, then we can label the track [E.4]. We
know here that we are tracking the movements of a person, represented by 13
objects.

If all of the objects had a different colour, then it would be simple to label the targets
found, since each colour would correspond to a joint in our model of the person.
However, we do not have enough colours that we can discriminate, and the colour of
the skin of the hands and the head is similar. We know for each joint what colour we
expect. With that knowledge and also knowledge of the likely movements of the
person we formulate some heuristics that can be used for target association [E.1],
and/or labelling [E.4]. Suppose we have the same colour for, say, the left ankle, the
right hip and the right shoulder, and we know that the person is standing or sitting.
Then the heuristic could be that the shoulder is above the hip and the hip is above the
ankle. When the situation occurs that we have exactly three targets satisfying that
heuristic, then we will label the targets accordingly.
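The example heuristic may be expressed in a few lines. The function name, the
tuple layout (x, y, z) with x as the vertical coordinate, and the assumption that a
larger x means a higher position are choices made for this sketch only.

```python
def label_by_height(targets, labels=("right shoulder", "right hip", "left ankle")):
    """Label same-coloured targets by vertical order, highest first.

    targets: list of (x, y, z) positions, x being the vertical coordinate
    (assumed to increase upwards). Returns None unless exactly as many
    targets as labels are present, so the heuristic is only applied when
    the situation is unambiguous.
    """
    if len(targets) != len(labels):
        return None
    ordered = sorted(targets, key=lambda t: t[0], reverse=True)
    return dict(zip(labels, ordered))


labelled = label_by_height([(0.1, 0.0, 2.0), (1.4, 0.0, 2.0), (0.9, 0.1, 2.0)])
```

Returning None when the number of targets does not match leaves such cases to
other heuristics or to the tracker's prediction rather than forcing a label.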

We now have a model of a person described by 13 points in 3D, i.e. we know the
positions of all the major joints of the person in absolute coordinates relative to the
optical system. If the position and orientation of the optical system is known, then
these positions can be transformed to, say, the coordinates of the room. So we know
at each instant where the person is in the room and the pose of the person, provided
the person is seen in both parts of the stereo image and the pose is within our
assumed heuristics. There are many possible uses for such a system, but often we
are also interested in knowing the movements relative to the person, independent of
where the person is situated in the room. In order to achieve this independence of
the position we fit an avatar to the above model of the person [F]. An avatar is a
hierarchical data structure representing a person. In our case the avatar is simplified
to a skeleton exhibiting the above 13 major joints. Each joint can have up to 3
possible axes of rotation. The root of the hierarchical structure is the pelvis. The
position and orientation of the pelvis is measured in absolute coordinates relative to
the camera system. The angles of rotation and the length of the bones of the
skeleton determine all the positions of the 13 joints. Since the bones are fixed for a
given person, the pose of the person is determined by the angles of the joints.
Unfortunately the function from pose to angles is not monotonic: a set of angles
uniquely determines one pose, but one pose does not have a unique set of angles.
So unless suitably restricted, the angles cannot be used as a measure of the pose. To
overcome this problem we have added an observation system [G], such that the
angles observed exhibit the required monotonicity. Since not all joints have 3 degrees of
freedom, we do not have 39 measures for angles, but only 31. Using these angles
and the position and orientation of the pelvis we can at any given instant determine
the pose of the person.
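How joint angles and bone lengths determine joint positions in a hierarchical
structure may be illustrated with a strongly simplified planar skeleton. The `Joint`
class, the restriction to one rotation axis per joint and the example bone lengths are
assumptions of this sketch and do not reproduce the 13-joint avatar described above.

```python
import math


class Joint:
    """Node of a hierarchical skeleton: a bone from the parent joint to this joint."""

    def __init__(self, name, bone_length=0.0, angle_deg=0.0, children=()):
        self.name = name
        self.bone_length = bone_length
        self.angle_deg = angle_deg          # rotation relative to the parent bone
        self.children = list(children)


def joint_positions(joint, origin=(0.0, 0.0), parent_angle_deg=0.0, out=None):
    """Walk the hierarchy from the root and return {name: (horizontal, vertical)}."""
    out = {} if out is None else out
    angle = parent_angle_deg + joint.angle_deg
    rad = math.radians(angle)
    pos = (origin[0] + joint.bone_length * math.cos(rad),
           origin[1] + joint.bone_length * math.sin(rad))
    out[joint.name] = pos
    for child in joint.children:
        joint_positions(child, pos, angle, out)
    return out


def tiny_skeleton(spine_angle_deg):
    """Pelvis root with a spine up to the neck and a bone out to one shoulder."""
    return Joint("pelvis", children=[
        Joint("neck", 0.5, spine_angle_deg, children=[
            Joint("right shoulder", 0.2, -90.0)])])


pose_a = joint_positions(tiny_skeleton(90.0))
pose_b = joint_positions(tiny_skeleton(450.0))   # different angle, same pose
```

The two calls give the same joint positions although the angle values differ, which
is a small instance of the non-uniqueness noted above: angles determine the pose,
but the pose does not determine one set of angles unless the angles are restricted.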

An application of such a system can for example be to analyse the movements of a
handicapped person performing an exercise for rehabilitation purposes. If an expert
system is used, then we could compare the movements to predetermined exercises
or gestures. The expert system could be based on a neural network, which is trained
to recognise the relevant exercise or gesture. We have chosen a different approach,
using physiotherapeutic knowledge of which of the angles will vary for a correct
exercise and which should be invariant. The advantage of this approach is mainly
that it is much faster to design an exercise than to obtain the training data for the
neural network by measuring and evaluating a given exercise for e.g. 100 or more
different persons.

The variations of the angles during an exercise can be used to provide feedback to
the person doing the exercise, both at the moment a wrong movement is detected
and if the exercise is executed well. The feedback can be provided by sounds, music
or visually. One could imagine that the movements in the exercise are used to control
a computer game, in such a way that the movements of the person are controlling
the actions in the game, mapping the specific movements to be trained to the
controls.

All of the above is well known, although still difficult to implement effectively. What we
claim is that such a system can be used as a new human computer interface, HCI, in
general. The detailed mapping of the movements to the controls required depends on
the application. If the system is used to control, say, a game, the mapping most likely
should be as natural as possible, for instance performing a kick or a jump would give
the same action in the game. To point at something, pointing with the hand and the
arm could be used, but it is also possible to include other physical objects in the
scene, e.g. a coloured wand, and use this for pointing purposes. The triggering of an
action when pointing at something can be done by movement of another body part
or simply by a spoken command.

While the present system requires even illumination and special patches of colour in
the clothing, it is known how to alleviate these requirements. One example is using the
3-D information more extensively to make depth maps and volume fitting of the parts
of the body of the avatar. Another is using an avatar which is much more detailed and similar to
the person in question, with skin and clothing, and fitting views of that avatar from
two virtual cameras positioned in the same way relative to the avatar as the person is
relative to the two mirror images of the real camera. The pose of the avatar is then manipulated
to obtain the best correlation of the virtual pictures to the real pictures. The above
descriptions use spatial information, but the use of temporal information is just as
relevant. For example, assuming that the camera is stationary, the variation in
intensity and colour from the previous picture for a given pixel represents either a
movement or an illumination change. This can be used to discriminate the person
from the background, building up an estimate of the background picture. Also,
detecting the movements reduces the processing required, since any object not
moving can be assumed to be at the previously determined position. So instead of
examining the whole picture for features representing objects, we can limit the search
to the areas where motion is detected.
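
The use of temporal information can be illustrated by the following minimal sketch. It assumes greyscale pictures stored as lists of rows, a fixed difference threshold and a running-average background estimate. These choices, and all function names, are assumptions made for illustration; the sketch does not distinguish a movement from an illumination change.

```python
def motion_mask(previous, current, threshold=20):
    """Mark pixels whose intensity changed by more than `threshold`."""
    return [[abs(c - p) > threshold for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]

def update_background(background, frame, mask, alpha=0.1):
    """Blend unchanged pixels into the background estimate.

    Pixels flagged as moving keep their old background value, so the
    person is not absorbed into the background picture.
    """
    return [[b if m else (1 - alpha) * b + alpha * f
             for b, f, m in zip(bg_row, fr_row, mask_row)]
            for bg_row, fr_row, mask_row in zip(background, frame, mask)]

def search_region(mask):
    """Bounding box (top, left, bottom, right) of moving pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, m in enumerate(row) if m]
    return (min(rows), min(cols), max(rows), max(cols))

# Two tiny 3x3 pictures: only the centre pixel changes.
previous = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
current = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
mask = motion_mask(previous, current)
background = update_background(previous, current, mask)
region = search_region(mask)
```

Feature detection would then be restricted to the returned region, while objects outside it are assumed to remain at their previously determined positions.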
CLAIMS

1. An electronic system for determining three-dimensional positions within a 
measuring volume, comprising 

at least one electronic camera for recording of at least two images with different
viewing angles of the measuring volume,

an electronic processor that is adapted for real-time processing of the at least two
images for determination of three-dimensional positions in the measuring volume of
selected objects in the images.

2. An electronic system according to claim 1, comprising 

one electronic camera for recording images of the measuring volume, and

an optical system positioned in front of the camera for interaction with light from the
measuring volume in such a way that the at least two images with different viewing
angles of the measuring volume are formed in the camera.

3. An electronic system according to claim 1 or 2, wherein the processor is further
adapted for recognizing predetermined objects.

4. An electronic system according to claim 3, wherein the processor is further 
adapted for recognizing body parts of a human body. 

5. An electronic system according to claim 4, wherein three-dimensional positions of
body parts are used for computer control.

6. An electronic system according to claim 4, wherein three-dimensional movements
of body parts are used for computer control.

7. An electronic system according to any of the preceding claims, wherein the 
processor is further adapted for recognizing colour patches worn by a human object 
in the measuring volume. 

8. An electronic system according to any of the preceding claims, wherein the
processor is further adapted for recognizing retro-reflective objects worn by a human
object in the measuring volume.

9. An electronic system according to any of the preceding claims, wherein the
processor is further adapted for recognizing exposed parts of a human body by
recognition of human skin.

10. An electronic system according to any of the preceding claims, wherein the
processor is further adapted for recognizing colors by table look-up, the table entries
being color values of a color space, such as RGB-values.

11. An electronic system according to any of claims 4-10, wherein the processor is
further adapted for determining three-dimensional positions of body parts in relation
to each other.

12. An electronic system according to claim 11, wherein the processor is further
adapted for determining human body joint angles.

13. An electronic system according to any of claims 4-12, wherein the processor is
further adapted for determining performance parameters related to specific body
positions.

14. An electronic system according to claim 13, wherein the processor is
further adapted for determining performance parameters of specific human
exercises.

15. An electronic system according to claim 14, wherein at least some of the
performance parameters are physiotherapeutic parameters.

16. An electronic system according to any of claims 13-15, wherein the processor is
further adapted for providing a specific output in response to the determined
performance parameters.

17. An electronic system according to claim 16, further comprising a display for
displaying a visual part of the output.

18. An electronic system according to claim 15 or 16, further comprising a sound
transducer for emitting a sound part of the output.

19. An electronic system according to any of the preceding claims, wherein the
optical system comprises mirrors for re-directing light from the measuring volume
towards the camera.

20. An electronic system according to any of the preceding claims, wherein the 
optical system comprises prisms for re-directing light from the measuring volume 
towards the camera. 

21. An electronic system according to any of the preceding claims, wherein the
optical system comprises diffractive optical elements for re-directing light from the
measuring volume towards the camera.

22. An electronic system according to any of the preceding claims, wherein the
optical system is symmetrical about a symmetry plane and the optical axis of the
camera substantially coincides with the symmetry plane.

23. A combined system comprising at least two systems according to any of the
preceding claims, having overlapping measurement volumes.

24. A method of calibrating a system according to any of the preceding claims, 
comprising the steps of 

positioning of a screen in the measuring volume of the system, 

projecting a calibration image with known geometrical features onto the screen, 

for specific calibration image pixels, determining the corresponding two image pixels
in the camera, and

calculating the line of sight for substantially each pixel of the camera sensor. 

25. A method according to claim 24, wherein the calibration image is generated by a 
projector with at least ten times less geometrical distortion than the system. 

26. A method according to claim 24 or 25, wherein the calibration image is a black
and white image.

27. A method according to claim 26, wherein the calibration image comprises one 
black section and one white section divided by a horizontal line. 

28. A method according to any of claims 23-27, wherein the calibration image
comprises one black section and one white section divided by a vertical line.

29. A method according to any of claims 23-28, wherein the step of projecting a 
calibration image comprises sequentially projecting a set of calibration images onto 
the screen. 
advantage of using a single camera to obtain stereo images is